
A statistical method for predicting splice variants between two groups of samples using GeneChip® expression array data

Author: Fan Wenhong
Hallahan Andrew R
Khalid Najma
Olson James M
Zhao Lue Ping
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Alternative splicing of pre-messenger RNA results in RNA variants with combinations of selected exons. It is one of the essential biological functions and regulatory components in higher eukaryotic cells. Some of these variants are detectable with the Affymetrix GeneChip®, which uses multiple oligonucleotide probes (i.e., a probe set), since the target sequences for the multiple probes are adjacent within each gene. Hybridization intensity from a probe correlates with abundance of the corresponding transcript. Although the multiple-probe feature in the current GeneChip® was designed to assess expression values of individual genes, it also measures transcriptional abundance for a sub-region of a gene sequence. This additional capacity motivated us to develop a method to predict alternative splicing, taking advantage of extensive repositories of GeneChip® gene expression array data. RESULTS: We developed a two-step approach to predict alternative splicing from GeneChip® data. First, we clustered the probes from a probe set into pseudo-exons based on similarity of probe intensities and physical adjacency. A pseudo-exon is defined as a sequence in the gene within which multiple probes have comparable probe intensity values. Second, for each pseudo-exon, we assessed the statistical significance of the difference in probe intensity between two groups of samples. Differentially expressed pseudo-exons are predicted to be alternatively spliced. We applied our method to empirical data generated from GeneChip® Hu6800 arrays, which include 7129 probe sets and twenty probes per probe set. The dataset consists of sixty-nine medulloblastoma (27 metastatic and 42 non-metastatic) samples and four cerebellum samples as normal controls.
We predicted that 577 genes would be alternatively spliced when we compared normal cerebellum samples to medulloblastomas, and predicted that thirteen genes would be alternatively spliced when we compared metastatic medulloblastomas to non-metastatic ones. We checked the consistency of some of our findings with information in the UCSC Human Genome Browser. CONCLUSION: The two-step approach described in this paper is capable of predicting some alternative splicing from multiple oligonucleotide-based gene expression array data with GeneChip® technology. Our method employs the extensive repositories of gene expression array data available and generates alternative splicing hypotheses, which can be further validated by experimental studies.
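
The two-step idea in this abstract (group adjacent probes with similar intensities, then test each group between two sample sets) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the greedy segmentation rule, the tolerance `tol`, the use of a Welch t statistic, and all function names are assumptions chosen only for illustration. The abstract does not specify the clustering criterion or the test used.

```python
from math import sqrt
from statistics import mean, variance


def cluster_pseudo_exons(probe_means, tol=0.5):
    """Greedily group physically adjacent probes (hypothetical rule).

    A probe joins the current segment when its mean log-intensity is
    within `tol` of the segment's running mean; otherwise it starts a
    new segment. Returns a list of lists of probe indices.
    """
    segments = [[0]]
    for i in range(1, len(probe_means)):
        seg = segments[-1]
        seg_mean = mean(probe_means[j] for j in seg)
        if abs(probe_means[i] - seg_mean) <= tol:
            seg.append(i)
        else:
            segments.append([i])
    return segments


def welch_t(a, b):
    """Welch t statistic for two independent samples (assumed test)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def pseudo_exon_stats(group_a, group_b, tol=0.5):
    """Step 1: segment probes; step 2: compare each segment between groups.

    group_a and group_b are lists of samples; each sample is a list of
    log-intensities ordered by probe position along the gene.
    """
    n_probes = len(group_a[0])
    pooled = group_a + group_b
    probe_means = [mean(s[i] for s in pooled) for i in range(n_probes)]
    results = []
    for seg in cluster_pseudo_exons(probe_means, tol):
        a = [mean(s[i] for i in seg) for s in group_a]
        b = [mean(s[i] for i in seg) for s in group_b]
        results.append((seg, welch_t(a, b)))
    return results
```

A segment with a large absolute statistic would be a candidate for differential expression of that pseudo-exon. A real analysis would also need normalization and multiple-testing control, which this sketch omits.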

University of Queensland eSpace

A class of models for analyzing GeneChip® gene expression analysis array data

Author: Fan Wenhong
Khalid Najma
Olson James M
Pritchard Joel I
Zhao Lue Ping
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Various analytical methods exist that first quantify gene expression and then analyze differentially expressed genes from Affymetrix GeneChip® gene expression analysis array data. These methods differ in the choice of probe measure (quantification of probe hybridization), the way multiple probe intensities are summarized into a gene expression value, and the analysis of differential gene expression. Research papers that describe these methods focus on performance and on how their approaches differ from others. To better understand the common features and differences between various methods, and to evaluate their impact on the results of gene expression analysis, we describe a class of models, referred to as generalized probe models (GPMs), which encompass various currently available methods. RESULTS: Using an empirical dataset, we compared different formulations of GPMs, and GPMs with three other commonly used methods, i.e., MAS 5.0, dChip, and RMA. The comparison shows that, on a genome-wide scale, different methods yield similar results if the same probe measures are chosen. CONCLUSION: In this paper we present a general framework, i.e., GPMs, which encompasses various methods. GPMs permit the use of a wide range of probe measures and facilitate appropriate comparison between commonly used methods. We demonstrate that the dissimilar results stem primarily from different choices of probe measures, rather than other factors.

Accurate, precise modeling of cell proliferation kinetics from time-lapse imaging and automated image analysis of agar yeast culture arrays

Author: Hartman John L
Laws Richard J
Shah Najaf A
Wardman Bradley
Zhao Lue Ping
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Genome-wide mutant strain collections have increased demand for high throughput cellular phenotyping (HTCP). For example, investigators use HTCP to investigate interactions between gene deletion mutations and additional chemical or genetic perturbations by assessing differences in cell proliferation among the collection of 5000 S. cerevisiae gene deletion strains. Such studies have thus far been predominantly qualitative, using agar cell arrays to subjectively score growth differences. Quantitative systems level analysis of gene interactions would be enabled by more precise HTCP methods, such as kinetic analysis of cell proliferation in liquid culture by optical density. However, requirements for processing liquid cultures make them relatively cumbersome and low throughput compared to agar. To improve HTCP performance and advance capabilities for quantifying interactions, YeastXtract software was developed for automated analysis of cell array images. RESULTS: YeastXtract software was developed for kinetic growth curve analysis of spotted agar cultures. The accuracy and precision for image analysis of agar culture arrays were comparable to OD measurements of liquid cultures. Using YeastXtract, image intensity vs. biomass of spot cultures was linearly correlated over two orders of magnitude. Thus cell proliferation could be measured over about seven generations, including four to five generations of relatively constant exponential phase growth. Spot area normalization reduced the variation in measurements of total growth efficiency. A growth model, based on the logistic function, increased precision and accuracy of maximum specific rate measurements, compared to empirical methods. The logistic function model was also more robust against data sparseness, meaning that less data was required to obtain accurate, precise, quantitative growth phenotypes.
CONCLUSION: Microbial cultures spotted onto agar media are widely used for genotype-phenotype analysis; however, quantitative HTCP methods capable of measuring kinetic growth rates have not been available previously. YeastXtract provides objective, automated, quantitative image analysis of agar cell culture arrays. Fitting the resulting data to a logistic equation-based growth model yields robust, accurate growth rate information. These methods allow the incorporation of imaging and automated image analysis of cell arrays, grown on solid agar media, into HTCP-driven experimental approaches, such as global, quantitative analysis of gene interaction networks.
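
As a rough illustration of fitting a logistic growth model to a time series, the sketch below recovers the rate parameter by linearizing the logistic curve. It is not YeastXtract's procedure, which the abstract does not describe. The function names, the assumption that the carrying capacity `K` is already known (for example, read from the plateau), and the logit-linearization itself are illustrative choices.

```python
from math import exp, log


def logistic(t, K, r, t_mid):
    """Logistic growth curve: K is the plateau, r the rate, t_mid the midpoint."""
    return K / (1.0 + exp(-r * (t - t_mid)))


def fit_logistic_rate(times, sizes, K):
    """Estimate r and t_mid for a known K by ordinary least squares.

    Uses the identity log(N / (K - N)) = r * (t - t_mid), so a straight
    line fitted to the transformed data has slope r and intercept
    -r * t_mid. Points outside the open interval (0, K) are skipped.
    """
    xs, ys = [], []
    for t, n in zip(times, sizes):
        if 0.0 < n < K:
            xs.append(t)
            ys.append(log(n / (K - n)))
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sxx
    intercept = mean_y - r * mean_x
    return r, -intercept / r
```

With noisy measurements, a nonlinear least-squares fit that also estimates `K` would usually be preferable. The linearized form is shown only because it makes the role of each parameter easy to see.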

A hybrid solution for extracting structured medical information from unstructured data in medical records via a double-reading/entry system

Author: Boulin Hou
Jiajia Hu
Ligang Luo
Liping Li
Lue Ping Zhao
Tianze Zhang
Xiaozhe Wang
Publication venue: Springer Nature
Publication date: 01/01/2016

Sequencing genes in silico using single nucleotide polymorphisms

Author: Hansen John A
Huang Xin
Li Shuying Sue
Zhang Bo
Zhang Xinyi Cindy
Zhao Lue Ping
Publication venue: BioMed Central
Publication date: 01/01/2012

BACKGROUND: The advent of high throughput sequencing technology has enabled the 1000 Genomes Project Pilot 3 to generate complete sequence data for more than 906 genes and 8,140 exons representing 697 subjects. The 1000 Genomes database provides a critical opportunity for further interpreting disease associations with single nucleotide polymorphisms (SNPs) discovered from genetic association studies. Currently, direct sequencing of candidate genes or regions on a large number of subjects remains both cost- and time-prohibitive. RESULTS: To accelerate the translation from discovery to functional studies, we propose an in silico gene sequencing method (ISS), which predicts phased sequences of intragenic regions, using SNPs. The key underlying idea of our method is to infer diploid sequences (a pair of phased sequences/alleles) at every functional locus utilizing the deep sequencing data from the 1000 Genomes Project and SNP data from the HapMap Project, and to build prediction models using flanking SNPs. Using this method, we have developed a database of prediction models for 611 known genes. Sequence prediction accuracy for these genes is 96.26% on average (range 79%-100%). This database of prediction models can be enhanced and scaled up to include new genes as the 1000 Genomes Project sequences additional genes on additional individuals. Applying our predictive model for the KCNJ11 gene to the Wellcome Trust Case Control Consortium (WTCCC) Type 2 diabetes cohort, we demonstrate how the prediction of phased sequences inferred from GWAS SNP genotype data can be used to facilitate interpretation and identify a probable functional mechanism such as protein changes.
CONCLUSIONS: Prior to the general availability of routine sequencing of all subjects, the ISS method proposed here provides a time- and cost-effective approach to broadening the characterization of disease associated SNPs and regions, and facilitating the prioritization of candidate genes for more detailed functional and mechanistic studies.

Crossref

Gene expression profiling identifies genes predictive of oral squamous cell carcinoma

Author: Chen Chu
Doody Dave
Fan Wenhong
Farwell D. Gregory
Futran Neal D.
Houck John
Lohavanichbutr Pawadee
Méndez Eduardo
Schwartz Stephen M.
Upton Melissa
Yueh Bevan
Zhao Lue Ping
Publication venue: The American Association for Cancer Research
Publication date: 31/07/2008

Oral squamous cell carcinoma (OSCC) is associated with substantial mortality and morbidity. To identify potential biomarkers for early detection of invasive OSCC, we compared gene expression of incident primary OSCC, oral dysplasia, and clinically normal oral tissue from surgical patients without head and neck cancer or pre-neoplastic oral lesions (controls), using Affymetrix U133 2.0 Plus arrays. We identified 131 differentially expressed probe sets using a training set of 119 OSCC patients and 35 controls. Forward and stepwise logistic regression analyses identified 10 successive combinations of genes whose expression differentiated OSCC from controls. The best model included LAMC2, encoding laminin gamma 2 chain, and COL4A1, encoding collagen, type IV, alpha 1 chain. Subsequent modeling without these two markers showed that COL1A1, encoding collagen, type I, alpha 1 chain, and PADI1, encoding peptidyl arginine deiminase, type 1, can also distinguish OSCC from controls. We validated these two models using an internal independent testing set of 48 invasive OSCC and 10 controls and an external testing set of 42 head and neck squamous cell carcinoma (HNSCC) cases and 14 controls (GEO GSE6791), with sensitivity and specificity above 95%. These two models were also able to distinguish dysplasia (n=17) from control (n=35) tissue. Differential expression of these four genes was confirmed by qRT-PCR. If confirmed in larger studies, the proposed models may hold promise for monitoring local recurrence at surgical margins and the development of second primary oral cancer in OSCC patients.

authors@Fred Hutch

Genomewide gene expression profiles of HPV-positive and HPV-negative oropharyngeal cancer: potential implications for treatment choices.

Author: Chen Chu
Doody David R
Fan Wenhong
Farwell D Gregory
Futran Neal
Houck John
Lohavanichbutr Pawadee
Mendez Eduardo
Schwartz Stephen M
Upton Melissa P
Yueh Bevan
Zhao Lue Ping
Publication venue: American Medical Association (AMA)
Publication date: 01/02/2009

OBJECTIVE: To study the difference in gene expression between human papillomavirus (HPV)-positive and HPV-negative oral cavity and oropharyngeal squamous cell carcinoma (OSCC). DESIGN: We used Affymetrix U133 plus 2.0 arrays to examine gene expression profiles of OSCC and normal oral tissue. The HPV DNA was detected using polymerase chain reaction followed by the Roche LINEAR ARRAY HPV Genotyping Test, and the differentially expressed genes were analyzed to examine their potential biological roles using the Ingenuity Pathway Analysis Software, version 5.0. SETTING: Three medical centers affiliated with the University of Washington. PATIENTS: A total of 119 patients with primary OSCC and 35 patients without cancer, all of whom were treated at the setting institutions, provided tissue samples for the study. RESULTS: Human papillomavirus DNA was found in 41 of 119 tumors (34.5%) and 2 of 35 normal tissue samples (5.7%); 39 of the 43 HPV-positive specimens were HPV-16. A higher prevalence of HPV DNA was found in oropharyngeal cancer (23 of 31) than in oral cavity cancer (18 of 88). We found no significant difference in gene expression between HPV-positive and HPV-negative oral cavity cancer but found 446 probe sets (347 known genes) differentially expressed between HPV-positive and HPV-negative oropharyngeal cancer. The most prominent functions of these genes are DNA replication, DNA repair, and cell cycling. Some genes differentially expressed between HPV-positive and HPV-negative oropharyngeal cancer (e.g., TYMS, STMN1, CCND1, and RBBP4) are involved in chemotherapy or radiation sensitivity. CONCLUSION: These results suggest that differences in the biology of HPV-positive and HPV-negative oropharyngeal cancer may have implications for the management of patients with these different tumors.

authors@Fred Hutch

Public Library of Science (PLOS)

Genetic Variation of the Human Urinary Tract Innate Immune Response and Asymptomatic Bacteriuria in Women

Author: Alan Aderem
Ann E. Stapleton
Delia Scholes
Derya Unutmaz
Hongwei Wang
Lue Ping Zhao
Marta Janer
Sue S. Li
Thomas M. Hooton
Thomas R. Hawn
Walter E. Stamm
Publication venue: Public Library of Science
Publication date: 01/12/2009

BACKGROUND: Although several studies suggest that genetic factors are associated with human UTI susceptibility, the role of DNA variation in regulating early in vivo urine inflammatory responses has not been fully examined. We examined whether candidate gene polymorphisms were associated with altered urine inflammatory profiles in asymptomatic women with or without bacteriuria. METHODOLOGY: We conducted a cross-sectional analysis of asymptomatic bacteriuria (ASB) in 1,261 asymptomatic women ages 18-49 years originally enrolled as participants in a population-based case-control study of recurrent UTI and pyelonephritis. We genotyped polymorphisms in CXCR1, CXCR2, TLR1, TLR2, TLR4, TLR5, and TIRAP in women with and without ASB. We collected urine samples and measured levels of uropathogenic bacteria, neutrophils, and chemokines. PRINCIPAL FINDINGS: Polymorphism TLR2_G2258A, a variant associated with decreased lipopeptide-induced signaling, was associated with increased ASB risk (odds ratio 3.44; 95% CI, 1.65-7.17). Three CXCR1 polymorphisms were associated with ASB caused by gram-positive organisms. ASB was associated with urinary CXCL-8 levels, but not CXCL-5, CXCL-6, or sICAM-1 (P ≤ 0.0001). Urinary levels of CXCL-8 and CXCL-6, but not ICAM-1, were associated with higher neutrophil levels (P ≤ 0.0001). In addition, polymorphism CXCR1_G827C was associated with increased CXCL-8 levels in women with ASB (P = 0.004). CONCLUSIONS: TLR2 and CXCR1 polymorphisms were associated with ASB and a CXCR1 variant was associated with urine CXCL-8 levels. These results suggest that genetic factors are associated with early in vivo human bladder immune responses prior to the development of symptomatic UTIs.
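
Readers who want to sanity-check a reported odds ratio against its confidence interval can use the short sketch below. It assumes the interval is a Wald-type interval that is symmetric on the log-odds scale. The abstract does not state how the interval was computed, so this is an assumption, and the function names are hypothetical.

```python
from math import log, sqrt


def point_estimate_from_ci(lower, upper):
    """Geometric midpoint of the interval.

    Under the assumed Wald interval on the log scale, this equals the
    odds ratio the interval was built around.
    """
    return sqrt(lower * upper)


def log_se_from_ci(lower, upper, z=1.959964):
    """Standard error of the log odds ratio implied by a 95% interval."""
    return (log(upper) - log(lower)) / (2.0 * z)


# Values reported in the abstract for TLR2_G2258A and ASB risk.
implied_or = point_estimate_from_ci(1.65, 7.17)
implied_se = log_se_from_ci(1.65, 7.17)
```

Under that assumption, the geometric midpoint of 1.65 and 7.17 rounds to the reported odds ratio of 3.44, so the published figures are internally consistent.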

Crossref